SCEPTRE improves calibration and sensitivity in single-cell CRISPR screen analysis

Single-cell CRISPR screens are a promising biotechnology for mapping regulatory elements to target genes at genome-wide scale. However, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two single-cell CRISPR screens how these challenges cause calibration issues. We propose SCEPTRE: analysis of single-cell perturbation screens via conditional resampling, which infers associations between perturbations and expression by resampling the former according to a working model for perturbation detection probability in each cell. SCEPTRE demonstrates very good calibration and sensitivity on CRISPR screen data, yielding hundreds of new regulatory relationships supported by orthogonal biological evidence.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02545-2.


1. The power of the proposed method should be examined using more real datasets, in addition to the simulated data and the two public studies interrogated in this study. This approach should be applicable not only to enhancer-gene association but also to any perturbation-expression association study. Compared to enhancer-gene pairs, which lack enough validated positive controls, CRISPR-based gene perturbation single-cell screens may provide more robust true positive/negative hits for evaluating the power of the analytic approaches. It may be worth checking these datasets and comparing to other statistical models.
2. In Fig. 3a-c, the color legend should be shown for 3a and 3b, not just 3c.
3. In Fig. 3b-c, scMAGeCK-LR should also be tested with real data despite its poor performance with simulated data.

4. Fig. 3e, in addition to this specific positive control hit, what is the general landscape for the positive control pairs using SCEPTRE with the Xie et al. dataset? Fig. 4 and 5, it is insufficient to draw a conclusion by just comparing SCEPTRE with the original analytic methods in the two public studies. For each dataset, if applying the SCEPTRE, Monocle NB, improved NB, Virtual FACS and scMAGeCK-LR methods, would SCEPTRE still be superior to, or significantly better than, the others? A similar comparison could be made using more datasets from either enhancer or gene perturbation single-cell CRISPR screens.

6. If the genes are binned by expression level, how does SCEPTRE perform at pinpointing perturbation-gene pairs?
8. Figures S2 and S4 are not cited in the main text.
9. Figure S3, Is there similar analysis for Xie et al data?
Reviewer 2
Were you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? No.
Were you able to directly test the methods? No.

Comments to author:
Barry and his colleagues developed a method to improve calibration and sensitivity in single-cell CRISPR screen analysis. They described the analysis challenges and demonstrated how they address them by conditional resampling. This work would be interesting and important for the future of single-cell CRISPR screening. However, the specific points below should be addressed before publication.
1. Could the authors use a summary diagram to demonstrate the analysis challenge and how their method addresses it?
2. The authors should be careful about some terminology. HiC and ChIP-seq are not functional assays.
3. The authors should describe more details in the figure legends so that they will be more readable.
4. In the discussion part, the authors applied SCEPTRE to interpret GWAS variants. This is an important part to show

Statement of Revision
In the following, we provide a detailed account of the changes that we have made in the revised paper as well as responses to all comments. We have structured this list into blocks, corresponding to the comments made by the Reviewers. The comments are in blue and our responses are in black.
2021)). The statistical challenges at play in the low-MOI setting, while related to those in the high-MOI setting, are sufficiently distinct that it is most appropriate to defer rigorous analysis of the low-MOI setting to future work. We are currently working on this extension for a follow-up study that we expect to complete in 6-12 months.
We were unable to apply scMAGeCK to the real data. First, we found the documentation of the scMAGeCK software package unclear to the point that we were uncertain how to apply the method to new data, despite having closely examined the examples and source code. Second, the authors of the original study deployed scMAGeCK only on a small subset of the gene-gRNA pairs assayed in Gasperini et al., due to the prohibitive computational burden of the method. To the best of our knowledge, scMAGeCK has never been deployed at scale on the Gasperini et al. data, which would be necessary to meaningfully compare SCEPTRE to scMAGeCK on calibration and sensitivity metrics.

Comments by
The previous version of the manuscript reported the output of scMAGeCK on a simple, simulated dataset consisting of one gene and one negative control gRNA (Figure 3a). We did not apply scMAGeCK "out of the box" to this simulated dataset, because of the limitations of the documentation. Rather, we implemented a custom, in-house version of scMAGeCK that consisted of code snippets from the scMAGeCK codebase. Our custom implementation is a faithful interpretation of the method in the specialized one-gene to one-NTC setting. However, our custom implementation does not apply to real data, because the real data are considerably more complex than the simulated data. For example, the real data consist of many genes and gRNAs, and the gRNAs come in different types (e.g., negative control, positive control, enhancer-targeting, etc.), complicating the analysis considerably.
In sum, we cannot apply scMAGeCK to the real data due to documentation and computational challenges. To reduce confusion, we described our challenges with scMAGeCK in the Methods section, and we moved the simulation result to the supplementary materials.
Unlike the Gasperini data, the Xie data did not come with any explicit positive control perturbations. In lieu of positive controls, Xie et al. (2019) conducted singleton perturbation screens using a highly sensitive bulk RNA-seq assay. These bulk screens were conducted only for perturbations targeting ARL15-enh and MYB-enh-3. We excluded the latter perturbations from consideration because these were found by Xie et al. to have fitness effects (see Figure S1A of Xie et al. (2019)), a complication beyond the scope of the current work. In short, Figure 3e represents the only positive control validation we could perform on the Xie dataset.
We conducted a new analysis in which we applied Monocle NB to the Xie data as well as improved NB to both datasets; Figures 4 and 5 in the revision reflect these changes. Unfortunately, Virtual FACS is not implemented as publicly available software, so we could not apply it to the Gasperini data. We found that SCEPTRE remained the best method despite the addition of these competitor methods. We updated the text in the section "Analysis of candidate cis-regulatory pairs" to reflect these new analyses.
We conducted a new analysis in which we binned candidate gene-enhancer pairs by gene expression level and computed the fraction of pairs rejected in each bin (Tables S1 and S2). We found that SCEPTRE was more likely to reject pairs that contained more highly expressed genes. We replicated this analysis on other methods (not shown) and observed similar trends. We added a paragraph to the section "Analysis of candidate cis-regulatory pairs" summarizing these results.
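For readers who want to reproduce the flavor of the binning analysis described above, the following is a minimal sketch in Python. It is not the code used to produce Tables S1 and S2: the function name `rejection_fraction_by_bin`, the choice of roughly equal-sized bins, and the default of four bins are all assumptions made for illustration.

```python
def rejection_fraction_by_bin(expression, rejected, n_bins=4):
    """Fraction of rejected gene-enhancer pairs per expression bin.

    expression: one expression level per candidate pair (hypothetical input).
    rejected:   one truthy/falsy flag per pair (True if the pair was rejected).
    Pairs are sorted by expression and split into n_bins bins of roughly
    equal size (an assumption; other binning schemes are possible).
    Returns a list of fractions, lowest-expression bin first.
    """
    if len(expression) != len(rejected):
        raise ValueError("expression and rejected must have the same length")
    # Indices of pairs ordered from lowest to highest expression.
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    n = len(order)
    fractions = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            # An empty bin has no defined rejection fraction.
            fractions.append(float("nan"))
            continue
        n_rejected = sum(1 for i in members if rejected[i])
        fractions.append(n_rejected / len(members))
    return fractions
```

A rising sequence of fractions from the first to the last bin would correspond to the trend reported above, in which pairs containing more highly expressed genes are rejected more often.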
[R1: 7] "LINE 253-258, FIGURE 4 IS UNCORRECTED CITED WHICH SHOULD BE FIGURE 5." Thank you for pointing this out; we have now fixed this error.
We now cite Figure S2 in lines 161-163: "Finally, we compute a left-, right-, or two-tailed probability of the original z-value under the empirical null distribution, yielding a well-calibrated p-value. This p-value can deviate substantially from that obtained based on the standard normal (Figure 2, Figure S2)."
We now cite Figure S5 (formerly Figure S4) in line 242 and in line 269: "Finally, enhancers discovered by SCEPTRE showed improved enrichment across all eight cell-type relevant ChIP-seq targets reported by Gasperini et al. (Figure 4d, Figure S5a)." "The SCEPTRE discoveries were more biologically plausible: compared to the Virtual FACS pairs, the SCEPTRE pairs were (i) physically closer (Figure 5b), (ii) more likely to fall within the same TAD (Figure 5c), (iii) more likely to interact when in the same TAD (Figure 5c), and (iv) more enriched for all eight cell-type relevant ChIP-seq targets (Figure 5d and Figure S5b)."
[R1: 9] "FIGURE S3, IS THERE SIMILAR ANALYSIS FOR XIE ET AL DATA?" We have added Figure S4, which is the analog of Figure S3 for the Xie data.
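As an illustration of the resampling idea quoted in the response regarding Figure S2, the following Python sketch resamples the perturbation indicator from per-cell probabilities, recomputes a test statistic on each resample, and reports a left-, right-, or two-tailed p-value against the resulting empirical null. It is a simplified stand-in, not the SCEPTRE implementation: the function name `conditional_resampling_pvalue` is hypothetical, the per-cell probabilities are taken as given rather than fitted, a difference in mean expression replaces the z-value used in the manuscript, and the p-value is computed by direct counting over the resamples.

```python
import random


def conditional_resampling_pvalue(x, y, probs, n_resamples=500, side="both", seed=0):
    """Empirical p-value from resampling perturbation indicators.

    x:     0/1 perturbation indicator per cell.
    y:     expression value per cell.
    probs: assumed per-cell probability of detecting the perturbation
           (in practice this would come from a fitted working model).
    side:  "left", "right", or "both".
    """
    rng = random.Random(seed)

    def statistic(indicators):
        # Stand-in statistic (assumption): mean expression in perturbed
        # cells minus mean expression in unperturbed cells.
        treated = [v for v, d in zip(y, indicators) if d]
        control = [v for v, d in zip(y, indicators) if not d]
        if not treated or not control:
            return 0.0  # degenerate resample: no contrast available
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = statistic(x)
    null = []
    for _ in range(n_resamples):
        # Redraw each cell's indicator independently from its probability.
        resampled = [1 if rng.random() < p else 0 for p in probs]
        null.append(statistic(resampled))

    # Add-one correction keeps the p-value strictly positive.
    left = (1 + sum(t <= observed for t in null)) / (n_resamples + 1)
    right = (1 + sum(t >= observed for t in null)) / (n_resamples + 1)
    if side == "left":
        return left
    if side == "right":
        return right
    return min(1.0, 2 * min(left, right))
```

Because the indicators are redrawn using each cell's own probability, cells that are more likely to show a detected perturbation (for example, deeply sequenced cells) are perturbed more often in the resamples as well, which is how this style of test accounts for the confounding described in the abstract.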